arXiv:1508.00506v4 [math.OC] 25 Oct 2016 


Journal of Machine Learning Research 17 (2016) 1-37 


Submitted 2/16; Revised 7/16; Published 9/16 


A Variational Approach to Path Estimation and Parameter 
Inference of Hidden Diffusion Processes 


Tobias Sutter sutter@control.ee.ethz.ch 

Department of Electrical Engineering and Information Technology 
ETH Zurich, Switzerland 

Arnab Ganguly aganguly@lsu.edu 

Department of Mathematics 
Louisiana State University, USA 

Heinz Koeppl heinz.koeppl@bcs.tu-darmstadt.de 

Department of Electrical Engineering and Information Technology 
TU Darmstadt, Germany 


Editor: Manfred Opper 


Abstract 

We consider a hidden Markov model, where the signal process, given by a diffusion, is only 
indirectly observed through some noisy measurements. The article develops a variational 
method for approximating the hidden states of the signal process given the full set of 
observations. This, in particular, leads to systematic approximations of the smoothing 
densities of the signal process. The paper then demonstrates how an efficient inference 
scheme, based on this variational approach to the approximation of the hidden states, can 
be designed to estimate the unknown parameters of stochastic differential equations. Two 
examples at the end illustrate the efficacy and the accuracy of the presented method. 
Keywords: Variational inference, stochastic differential equations, diffusion processes, 
hidden Markov model, optimal control 

1. Introduction 

Diffusion processes modeled by stochastic differential equations (SDEs) appear in several 
disciplines varying from mathematical finance to systems biology. For example, in systems 
biology stochastic differential equations are used for efficient modeling of the states of the 
chemical species in a reaction system when they are present in high abundance Wilkinson 
(2006). Oftentimes, the state of the system or the signal process is not directly observed, 
and inference of the state trajectories and parameter of the system has to be achieved 
based on noisy partial observations. Typically, in such a scenario, the observation data is 
conveniently modeled as a function of the hidden state corrupted with independent addi¬ 
tive noise. However, generalizations of this basic setup, which, for example, could include 
stronger coupling between the hidden signal and the observation processes, are often used 
for modeling more complex phenomena. 

In such a model optimal filtering theory concerns itself with recurrent estimation of the 
current state of the hidden signal process given the observation data until the present time. 
This is particularly useful in tracking problems where the estimation of the current location 
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of an object needs to be constantly updated as new noisy information flows in. On the other 
hand, optimal smoothing involves the class of methods which can be used in reconstruction 
of any past state of the signal process given a set of measurements up to the present time. 
More specihcally, given the signal process X and the observation process Y, filtering theory 
entails computation of the conditional expectations of the form E[(/>(Xt)|Jy^], where {J -^} 
denotes the filtration generated by the process Y. The cr-algebra contains all the 
information about the observation process Y up to the present time t. Smoothing, however, 
involves evaluation of the conditional expectations of the form E [(/)(X 5 )| , where s < t. 

The smoothing techniques can also be viewed as tools of estimation of the current state 
given a data set which includes future observations. This interpretation is particularly 
relevant in statistics, where such techniques are essentially the means of computing certain 
posterior conditional densities given the observation set. The present article focusses on a 
variational approach to this smoothing problem and later employs the method for estimation 
of parameters of diffusion processes. 

Evaluation of such conditional expectations or densities are quite difficult, since they 
are often solutions of suitable (stochastic) partial differential equations. These are usu¬ 
ally infinite-dimensional problems and analytical solutions are generally impossible. Hence, 
effort has been directed toward developing of a variety of numerical schemes for efficient 
approximation of these conditional densities. While Markov chain Monte Carlo methods for 
inference use discretization of the given SDE for writing down an approximate likelihood 
Kushner and Dupuis (2001); Pages and Pham (2005); Andrieu et al. (2010), particle meth¬ 
ods approximate the (posterior) conditional densities by suitably weighted point masses 
Crisan and Lyons (1999); Del Moral et al. (2001); Bain and Crisan (2009). However, these 
methods often rely on a suitable discretization of the problem which is mostly done in an 
ad-hoc way. Since a theoretical framework for obtaining approximations is not present, the 
approximation error might be difficult to quantify. 

In contrast, the present paper focusses on a variational approach to this estimation 
problem. The main idea in such a method is to approximate the (posterior) conditional 
probability distribution of the system’s state (given the observed data) by an appropri¬ 
ate Gaussian distribution, where the optimal parameters for the Gaussian distribution are 
obtained by minimizing the relative entropy (or Kullback-Leibler distance) between the 
posterior process and a suitable approximating SDE. Earlier works like Archambeau et al. 
(2007, 2008); Archambeau and Opper (2011); Cseke et al. (2013); Vrettas et al. (2015) 
considered the case when the signal process is modeled by an SDE with a constant diffu¬ 
sion term. The advantage of working with a constant diffusion term is that it implies that 
the approximating SDE will simply have a linear drift so that marginals are distributed 
as Gaussian. This simple expression of the SDE with a linear drift makes the subsequent 
optimization problem for finding the suitable parameters for this approximating SDE eas¬ 
ier. However, since most physical phenomena cannot be realistically modeled by SDEs with 
constant diffusion term, there is a pressing need of extending the approach to general SDEs. 
One natural but naive approach in this regard could be to freeze the diffusion term at an 
appropriate value, that is, to take the zeroth order expansion of the diffusion coefficient. 
Although simple to implement, the efficacy of the method is not guaranteed by theoreti¬ 
cal results and will vary from case to case, and a reasonable error analysis might require 
unreasonably restrictive conditions on the model. 
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Instead, the present article delves much deeper in to the problem and develops methods 
for finding the optimal approximating SDE such that the relative entropy between it and 
the true posterior process is minimized subject to the condition that the marginals of the 
former follow Gaussian distributions. The main obstacle that needs to be overcome in this 
approach stems from the fact that unlike the previous case, the approximating SDE here 
cannot be taken to be the one with a linear drift; and a suitable expression of it needs to 
be found so that the marginals are still Gaussian. This has been achieved in Theorem 9. In 
fact, our work outlines the most general techniques for approximating the posterior density 
by any density from the exponential family or mixture of exponential families. In this 
connection we would like to note that the reason for requiring that the marginals follow a 
Gaussian distribution or more generally, a distribution from the exponential family because 
this results in a finite-dimensional smoother which can be used for approximating a wide 
range of distributions. 

It should be noted that the variational method considered here is different from the 
so-called extended Kalman filter (EKE) in two ways; first, EKE is employed for filtering 
problems; but more importantly, EKE starts by linearizing the signal (prior) SDE and then 
freezing its diffusion term, while the variational approach is concerned with approximation 
of the posterior SDE. Therefore even though in the constant diffusion term case, the ap¬ 
proximating SDE happens to have linear drift and thus resulting in a Gaussian smoother, it 
is not based on the same philosophy behind the EKE. And as mentioned before, in the non¬ 
constant diffusion term case although our method can be used to obtain a finite-dimensional 
smoother, in particular, a Gaussian smoother, it completely avoids any form of linearization 
of the given SDE or subsequent freezing of the diffusion term. 

In our paper this variational approximation method has been formulated as an optimal 
control problem. The advantage of this theoretical framework is that necessary conditions 
for global optimality are then obtained by employing the Pontryagin maximum principle. 
This leads to considerable computational advantages of the variational method compared to 
numerically solving the underlying (stochastic) PDEs, that is highlighted by two examples. 

The later part of the paper focusses on the important topic of parameter inference of 
SDEs. The above scheme of estimating the hidden states and the smoothing densities is 
cleverly used in designing an efficient method for estimating parameters of SDEs. In particu¬ 
lar, the paper proposes an iterative EM-type algorithm which aims to compute approximate 
maximum likelihood estimates of the parameters in a tractable way. Two illustrative exam¬ 
ples, which are important in mathematical finance, demonstrate the accuracy and efficiency 
of the proposed algorithms. Euture projects will address more complicated models. 

The layout of this article is as follows: In Section 2 we formally introduce the problem 
setting. We consider as a running example throughout the manuscript a geometric Brownian 
motion. The variational approximation idea is motivated in Section 3 leading to a specific 
class of optimization problems that is addressed in Section 4, It is then reformulated 
in Section 5 as an optimal control problem and necessary conditions for optimality are 
derived. Section 6 explains how the variational approximation can be used to infer unknown 
parameters of the model. Section 7 discusses the presented variational approximation in 
the context of a discrete time measurement model. The theoretical results are applied in 
Section 8 to two examples: a geometric Brownian motion and to the Cox-Ingersoll-Ross 
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process. We finally conclude with some remarks and directions for future work in Section 9, 
Certain technical proofs are relegated to the appendix. 

Notation. Hereafter, I„ is the n-dimensional identity matrix and Ej is the n x n matrix 
where the ii-lh entry is one and zero elsewhere. We let Sym(n,M) and GL(n, M) be respec¬ 
tively the set of symmetric and invertible n x n matrices with real entries. For matrices 
H, H G let := tr(H^H) denote the Frobenius inner product. For a vector 

b G M” and a positive definite matrix A, we employ the norm ||6||^ := A~^b. We define 

the standard n—simplex as A„ := {x G M"" : x > 0, Xi = 1}. Let C := (^([O, T], M”) de¬ 
note the space of continuous functions on [0, T] taking values in MA. Let 5 be a metric space, 
equipped with its Borel a-field B{S). The space of all probability measures on {S,B{S)) is 
denoted by V{S). The relative entropy (or Kullback-Leibler divergence) between any two 
probability measures HjU £ 'P(5) is defined as 


D(hlk) 


-|-oo, otherwise. 


where <C denotes absolute continuity of measures and ^ is the Radon-Nikodym derivative. 
By convention measurable means Borel-measurable in the sequel. Given an S'-valued random 
variable X with Law(A) = ^ G 'P{S), let denote the expectation of X. 


2. Model setup 

As usual, we will work on a complete probability space (H,T', P) equipped with a filtration 
{Xt} satisfying the usual conditions, that is, {Ft} is complete, right continuous and contains 
all the P-null sets. The basic objects in our study consist of a signal process X and an 
observation process Y, both of which are assumed to be {Ft}- adapted. The unobserved 
signal process X is modeled by the following stochastic differential equation describing the 
state evolution of a dynamical system: 

dXt = f{Xt)dt + a{Xt)dWt, Xo = xo, 0<t<T, (1) 

where / : M” —)• MA, a : M” —)• and W is an n-dimensional Brownian motion 

independent of xq. The observation process Y is modeled as noisy measurements of some 
function of the signal process X. Mathematically, Y is defined as 

Yt= f\{Xs)ds + Bt, (2) 

Jo 

where h : MA —)• MX is called the observation function and B is an m-dimensional Brownian 
motion independent of xq and W. 

Assumption 1 We stipulate that 

(i) / and a are globally Lipschitz; 

(ii) and h is twice continuously differentiable. 
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It is known Kallenberg (2002) that under Assumption 1 there exists a unique strong solution 
to the SDE (1). Given the observed data up to some time T, {y^ : s < T}, the goal of the 
paper is to outline an approximation method for the smoothing density, Vs{x,t), which is 
the conditional probability density of Xt given {1^ : s < T}. In other words, the smoothing 
density is defined by the equation: 

= I ^{x)Vs{x,t) dx, (3) 

up to a.s. equivalence, where (j) is any bounded measurable function from to M and 
} denotes the filtration generated by the process Y. 

More generally, we will be interested in approximating the full conditional probability 
measure on the path space, C = ^^([O, T], M”'). To describe this mathematically, assume that 
a regular conditional probability measure is chosen. Then there exists a measurable 

probability kernel y £ C ^ npost(-,y) £V{C) such that for any measurable set M C C, 

^^[^[o.T] £ = npost(A, Y[o^r]). 

Given the observation process up to time T, y[o,r] > we now describe a characterization of 
the probability measure npost(u Ejg'pj), which will play a pivotal role for our purposes. The 
probability measure npost(u b^[o,r]) is actually the distribution of a diffusion process X'^ on 
C, and the latter is obtained by a modification of the original signal process X: 

dXf = g{Xj, t)dt + a{Xj)dWu ^ = xq, (4) 

where W is an {J^t}-adapted Brownian motion that is independent of Y. Notice that the 
diffusion coefficient of the above SDE (which we will henceforth call the posterior SDE or 
posterior diffusion) is same as that of the original SDE, and the drift of this posterior SDE 
is time-dependent and is obtained as 

g{x,t) ■.= f{x)Xa{x)X log w{x,t), (5) 

where a{x) := cj(x)(t(x)T We give details about the (random) function w a little later, but 
the important point to note here is that the new drift function is the old drift function with 
an extra additive term, and the observation process y[o,T] enters into the characterization 
of npost(-, 4^[o,r]) only through w. 

To see this characterization of npost(', 4 )o,t])) we first look at the usual filtering density 
VF{x,t), which is naturally defined by 

E[(j){Xt)\X^] = j (l){x)'PF{x,t) dx. (6) 

Under suitable technical conditions, the filter density Vf satisfies the Kushner-Stratonovich 
equation (for example, see Stratonovich (I960); Kushner (1967); Bain and Crisan (2009)). 
Eor our purposes, however, it is convenient to work with the unnormalized filter density 
p{x,t), that is, VF{x,t) = p{x,t){J^n p{x,t)dx)~^, which satisfies the so-called Zakai equa¬ 
tion Zakai (1969) 

f dp{x, t) = A*p{x, t)dt + p{x, t)h{xydYt 
1 p{x,0) = Po{x). 
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Here po denotes the density of xq and A* is the adjoint of the infinitesimal generator of the 
process X given by Ai^ix) = Y,i + 5 Y.i,j for V' G C^{W^,R). 

We next consider the backward stochastic partial differential equation (SPDE) 

{ dw{x,t) =—Aw{x,t)<it — w{x,t)h{xydYt 
w{x,T) = 1. 


Conditions about existence of solutions to (7) and ( 8 ) can be found in Pardoux (1981/82). 
It is well known (Pardoux, 1981/82, Corollary 3.8) that the smoothing density can be 
expressed as 


'Ps{x,t) 


p{x, t)w{x, t) 
/g„p(x,t)rc(x,t)dx’ 


(9) 


Now by using (7), ( 8 ) and (9), it can be shown-^ that the smoothing density solves the 
following Kolmogorov forward equation 


I + E - 5 E = “• (“) 

\ * i,j / 

with the drift term g defined by (5). In other words, the conditional probability measure 
HpostCu E[o^r]) on C is induced by the diffusion process as defined in (4). 

Evaluating npost(') is what is known as the path estimation problem. Except for 

a few simple cases, the SPDEs, that are involved in this estimation of the hidden path, are 
analytically intractable. The variational approach that we undertake in this paper actually 
has the goal of approximating npost(-, E[o^'r])- Toward this end, a natural objective is to 
approximate npost(u by a probability measure such that the corresponding marginals 

of the latter come from a known family of distributions (e.g, exponential family). As a 
result, the marginal of this approximating probability measure at time t approximates the 
smoothing density Vs{xA)- The procedure adopted in this article involves finding the 
optimal parameters of this approximating distribution by minimizing the relative entropy 
between the posterior distribution and the approximating one. 


2.1 Example: Geometric Brownian Motion 

We present as a running example throughout the article the geometric Brownian motion 
that is used to model stock prices in the Black-Scholes model, see Shiryaev (1999). The 
system dynamics (1) is given by a one-dimensional geometric Brownian motion 

dXt = nXtdt + XXtdWt, Xq = xq ~ logX(/U, a), ( 11 ) 

for 0 < t < T, A, K > 0 and an observation process ( 2 ) defined by 

Yt= f Xsds + Bt. (12) 

Jo 

It is straightforward to see that Assumption 1 holds in this setting. 

1. See Appendix A for a detailed derivation. 
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3. Variational approximation: Motivation 

Let Hprior denote the distribution of the original signal process X on C, that is, for a 
measurable A <Z C, nprior(^) = P[V[o^ 7 ’] G vl]. Define the two terms 

i/r(V[o,T],y) := -KXT)yT+ r VsdhiXs) + \ C \\h{Xs)fds (13) 

Jo ^ Jo 

HHT{-,y)) ■■= -log^y exp(-iLr(-,y))dnprioy . (14) 

Let y be a sample path of the observation process Y on the interval [0, T], Then notice that 
by the pathwise Kallianpur-Striebel formula (or the Bayes formula), we have 

dBpost(•,!/) ^ exp{-HT{-,y)) ^ exp(-iLr(-, y)) 

dllprior J" GXp( — iLj’(•, y))dllpj-ior 

where L{y) = f exp(—i7r(-, y))dnprior- Consequently, L(y) can be interpreted naturally 
as the likelihood of the path y, or equivalently, I{HT{-,y)) is viewed as the negative log- 
likelihood of the sample path y. The term HT{X^Qrp^,y) can be interpreted as the X- 
conditional information and the information in the observation that Y = y, see Mitter and 
Newton (2003) for more details. Now for any probability measure Q'^ on (^([O,r],M), the 
relative entropy between Q and npost(uy) can be expressed by the following lemma. 


Lemma 2 D((5||npost(-, y)) =-/(iLT(-, y)) + D(Q||nprior) +EQ[iLT(-,y) • 


Proof. The proof essentially follows the one in (van Handel, 2007, Lemma 2.2.1). Splitting 
the relative entropy and using the pathwise Kallianpur-Striebel formula yields 


□(gllHprior) = J 
= D(g||npost 
= D(g||npost 
= D(g||npost 


log 


dg 


dnpost(‘, y) 


+ log 


f dnpost(uy) 


I, 


dn 


prior 


dg 


dnpost(') y) 


dn 


dg 


prior 

exp(-iLT(-,y)) 


(uy)) + J log 

( ,y)) +\^J-exp(-i7T(-,y))dnpnor 

(•ly)) -IEQ[i7r(-,y)] - log (^j exp(-iLr(-,y))dn 


dg 


prior 


Mitter and Newton Mitter and Newton (2003) provide an information-theoretic interpre¬ 
tation to this result. They interpret the term (14) as the total information available to 
the estimator Q through the sample path y. On the other hand, they call the quantity 
F{Q,y) := D(g||nprior) -|-Eg[L7'r(-, y)] the apparent information of the estimator. By 
non-negativity of the relative entropy iF{Q,y) > I{HT{-,y)) with equality if and only if 
g = npost(uy)- In this sense, a suboptimal estimator appears to have access to more 
information than is actually available. 

2. Q will be called the approximating probability measure in the sequel. 
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Since the total information I{HT{-,y)) does not depend on Q, minimizing the relative 
entropy between Q and npost(')y) over a class of probability measures Q is equivalent to 
minimizing the apparent information J^{Q, y). This motivates to consider an approximating 
distribution Q on C that is characterized as the solution to the following optimization 
problem: 

Problem 3 Minimize D((5|jllprior) +'^Q[HT{-,y)] subject to 
(i) Q is a probability distribution induced by an SDE of the form 

dZt = u{Zt,t)<lt + (T{Zt)dWt, Zq = xo, 0<t<T; (15) 

fiij The marginals of Q at time t, i.e., the distribution of Zt, belong to a chosen family of 
distributions. 

We will show in the remainder of this article how Problem 3 can be restated as an optimal 
control problem, which leads to a standard formulation of necessary optimality conditions 
in terms of Pontryagin’s maximum principle. 

Note that the objective function of Problem 3 is known to be strictly convex with re¬ 
spect to Q, see Csiszar (1975). The constraint (ii) restricts the feasible set approximating 
distributions Q to a nonconvex set. Note that such problems (i.e., absence of constraint (i) 
have been studied in the literature Pinski et al. (2015)). In our setting, the set of feasible 
solutions is also coupled with the first constraint (i), that parametrizes the feasible set of 
distributions in terms of the drift function u. This coupling is investigated in Section 4, in 
particular Theorem 9 characterizes the set of all drift terms u such that the distribution 
induced by (15) has finite dimensional marginals that belong to a given family of distri¬ 
butions. Hence, Problem 3 can alternatively be interpreted as minimizing the objective 
function over a class of drift functions u that induce Q via (15) and such that Q satisfies 
constraint (ii). For example, if the goal is to approximate the posterior distribution Upost 
by a distribution Q whose marginals are normal distributions, then one aims to find a drift 
term u such that the objective function is minimized and such that the solution Zt to (15) 
admits a normal distribution. 

Remark 4 Notice that the unconstrained optimization of the objective function in Problem 
3 with respect to Q will simply yield the minimizer Q to be Hpost- Since, as discussed in 
the beginning of Section 2, Hpost is induced by the SDE, (4), the constraint (i) in Problem 
3 is essentially inbuilt. In other words, it is the constraint {ii) which plays the crucial role 
in the methods outlined in this paper. 

The objective function in Problem 3, in particular the relative entropy between the approx¬ 
imating distribution Q and the prior distribution Hprior can be simplified, since due to the 
constraint (i) the underlying SDEs (15) and (1) share the same diffusion coefficient. In view 
of (15) and (1), consider two SDEs for 0 < t < T 

dXt = f{Xt)dt + a{Xt)dWt, dZt = u{Zt, t)dt + aiZt)dWt, Xq = Zo = xq, 

with u : M” x M —)■ MT, f : M” —)■ M”, a : M” —)• W an n-dimensional Brownian motion 

independent of xq and both SDEs satisfying Assumption 1, Let (D,Tr, P) be a probability 
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space, where Ft is the sigma algebra a{Ws ■ s < T) and let Ilprior and Q denote the the 
laws of Xt and Zt with respect to P. It follows by Girsanov’s Theorem 0ksendal (2003), 


that 


E 


Q 


log 


dg 


dn 


prior / J 


2^Q 


(p{s, u’)^(p(s, uj)ds 


where (p(s,uj) := o'(Zs(uj)) ^ (u(Xs(uj)) — f(Xs(uj))) . Therefore, the relative entropy be¬ 
tween g and Ilprior is 


D(giinprior) = 


where \\u{x, s) — := (tt(x, s) —/(x))^ a(x) ^ {u{x, s) — f{x)). Hence, the objec¬ 

tive function in Problem 3 can be expressed as 


D(g||nprior)+EQ[i7r(-,y)] 

T 


= Teq 

Jo 


2 M^t,t)-fiXt)ra^^^) + yt[u{Xt,tyvhiXt) 

dt - 2/tEq[/i(X7’)] , 


(16) 


+ ^aiXtVV^h{Xt)a{Xt)] + ^||/i(Xt)f 


where the last equality is due to Fubini’s Theorem and Ito’s Lemma. The two coupling 
constraints (i) and (ii) in Problem 3 are studied in the next section and will finally allow 
us to reformulated Problem 3 as an optimal control problem. 


4. Multi-dimensional SDE with prescribed marginal law 

This section establishes conditions on the drift function in the approximate SDE (15) such 
that the induced marginal distributions evolve in a given exponential family. 

Definition 5 (Exponential family) LetTJi,... ,'Hm be Hilbert spaces and letH = 
be endowed with the inner product {■,■)■ Let the funetions Ci : —)• Hi for i = 1,... ,m be 

linearly independent, have at most polynomial growth, be twice eontinuously differentiable 
and denote c(x) = (ci(x ),... ,Cm(x)). Assume that the convex set 
P := {0 G H : i/^(0) = log f exp (^0,c(x))) dx < oo} has non-empty interior. Then 

EM{c) = {pf, 0), 0 G A}, p{x, 0) := exp ((0, c(x)) - -0(0)) , 

where A C T is open, is called an exponential family of probability densities. 

Definition 6 (Mixture of exponential families) Let EM{c^'^">) for i = 1,... ,k he expo¬ 
nential families according to Definition 5. Then 

( ^ 

EM(c«,...,c('^))= J]u,p<,(-,0W) : p^(-,0W) G EM(cW), u G 

e=i 

is called a mixture of k exponential families of probability densities. 
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Consider the stochastic differential equation (15), where r : M"" x M —)■ M”, a : M” —)■ 
and PC is a d-dimensional Brownian motion independent of xq. 


Assumption 7 

1. The SDE (15) satisfies Assumption 1. 

2. The initial condition xq has a density po that is absolutely continuous with respect to 
the Lebesgue measure and has finite moments of any order. 

3. The unique solution Xt to (15) admits a density p{x,t) that is absolutely continuous 
with respect to the Lebesgue measure and that satisfies the Kolmogorov forward 
equation. 

Problem 8 Let ..., be a mixture of exponential families, let po be a density 

eontained in let a be a diffusion term and let a(-) := Let 

U{xo,a) denote the set of all drifts u sueh that xo,u,a and its related SDE (15) satisfy As¬ 
sumption 7. Assume U{xq, a) to be non-empty. Then given a eurve 1 1 —)■ p{-, ..., 

in EM{c^^l ,..., find a drift inUfxQ, a) whose related SDE has a solution with marginal 
density p{-,e^^\...,QP). 


A solution to Problem 8 is given by the following theorem. 


Theorem 9 Given the assumptions and notation of Problem 8. Consider the SDE (15) 
with drift term 


1 n ^ 1 n 

{x, i) = 2 + 2 ^ 


gl-p(x,0r,...,0r) 


.(1) 




1=1 


i=i 

k 




1 


for i = 1,... ,n, where 

Xf\x) := J d^i, (17) 

{xi-,^i) := {xi,... ,Xi-i,f,i,Xi^i,... ,XnY and the functions : M” x TL ^ TL for all 
£ = 1,..., fe satisfy 

n 

E = (0f\cW(x) - ))• (18) 


i=l 


If u G U{xo,a), then the SDE (15) solves Problem 8, i.e., Xt has a density 

k 


PXtix) = E^^exp K 0f\c(^)(x)\ - V’K0f^)) > for all t 


< T. 


e=i 
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The proof is provided in Appendix B, 

Remark 10 1. For the non-mixture and one-dimensional case (A: = n = 1), the result 

is known Brigo (2000) and coincides with Theorem 9, Furthermore, it can be seen by 
the proof in Brigo (2000) and by invoking the existence and uniqueness theorem for 
ODEs, that the drift function u is uniquely determined. 

2. For the multi-dimensional case (n > 1), the drift function is not unique anymore, as 

there exist multiple choices for This gives rise to a natural question, if there 

(t) 

exist a particular choice of ' such that the integral terms XI in (17) admit closed- 
form expressions. In Section 4.1 (Proposition 11), we derive such functions for 
the mixture of multivariate normal densities. 

3. In a non-mixture setting [k = 1), the drift function simplifies to 

j=i ^ j=i ' ^ ' 

- (®t, J (^i((x_i,^i),0t)exp [(0t,c(x_i,^i) -c{x))] dCi^ , 

where the functions ipi have to satisfy (18). 

As remarked, the drift term proposed in Theorem 9 consists of the integral terms (17), 
that depend on the particular exponential families considered. In the following, we restrict 
ourselves to the mixture of multivariate normal densities and show that these integral terms, 
and hence the drift function, admit a closed-form expression. 


4.1 Mixture of multivariate normal densities 

Consider the family of multivariate Gaussian distributions with mean m G and covari¬ 
ance matrix S G Sym(n,M), that can be expressed in terms of Definition 5 as follows. Let 
the Hilbert space Ti = M” x be endowed with the inner product (^{a, A), (b, B)'j = 

a^b + tr{A^B) and define 


0 = (r/, 9) := ^m, ^ c : M"" —)■ B, c{x) = (x, xx^) 

if; -0(0) = + ^logdet + ^log(27r). 

A direct computation, using tr{r]r]^6~^) = r]^9~^ri, leads to 

p{x,e) = exp ((c(x),0) -iA(0)) =-- Ye^v(-lix-myS~^{x- 

(27r)2(det5)2 V 2 


(19) 



3. For example, := Sij{c^^\x) — VeV><(0)^^)) for all j e {1,... ,n} are feasible choices for (pl^\ 

as they satisfy (18). 
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We point out again that for the proposed variational method, it is favourable if the approx¬ 
imating SDE (15) has a drift function that admits a closed-form expression. Furthermore, 
since the drift function is not unique (cf. Remark 10), among all feasible solutions char- 
acterized by the ' functions, we want to find one that can be computed analytically. 
The latter turns out to be a difficult task and depending heavily on the specific exponential 
familiy chosen. From now on, we consider the exponential family of the multivariate normal 
probability densities that is given by (19). In this setting, it is possible to find functions 
ipl ' such that the integral terms (17), and therefore the drift function, can be computed in 
closed form. 


Proposition 11 For the mixture of multivariate normal densities, one possible choice for 
the drift function proposed by Theorem 9 is 


1 


u{x,t) = -div(a(x))-|- 


p(x,0r\...,0f^)V4 * * ‘ 

- + a(x) ^ . 

The proof is provided in Appendix C. 

Remark 12 For the non-mixture setting the drift term simplifies to 

u{x,t) = ^div(a(x)) -h + a(x) Q??* -k , 

that in the special case of a constant diffusion term is a linear function, as one would expect. 


We introduce the following ansatz for the drift function 


u{x, t) = -div(a(x)) -k 




A 




+Bj^^x+a(x) xj^ 


p(x,ei^\...,er) 


(k)) 


( 20 ) 


where G 


and cannot be chosen arbitrarily. They are coupled according to Proposition 11, 
By comparing the coefficients of Proposition 11 and (20) one gets 




and G 


for all £ = 1,... k. The coefficients a[^\ b[^\ 


A 


1 (€) 




vr - ft 


1 


m 


d(^) _ 


1 


<>!■'. = Dk'=«k', 


Hence, one directly sees that the four parameters A^^\ Bf, cf and for all £ = 1,... ,k 
are coupled via the two ODFs 


dC, 


(^) 


= -Hf - B, 


dt 




dZl 




t ’ 


dt 


= -2Df'^Bf\ 


( 21 ) 


Note that the parametrization introduced in (20) provides relatively simple expression for 
the mean and variance of the variational approximation derived in the next section (Sec¬ 
tion 4.2). In the authors’ opinion this parametrization therefore helps to keep the notation 
simple. 
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4.2 Equations for mean and variance 

Theorem 9 provides an explicit formula for the drift term in the approximating SDE (15), 
that simplifies to (20) in the case of multi-normal marginal densities. Therefore, the mean 
and variance of the approximating SDE (15) are characterized via the following two ODEs. 

Theorem 13 Consider the SDE (15) with drift term u given by (20), such that the solution 
Xt has a marginal density p{x, ..., G EM{ci ,..., Ck) that is an arbitrary convex 
eombination of densities pe{x, G EM{ci) for £ = 1,... ,k. Let and denote the 
mean and varianee of Xt with respect to pe{x,Qf^^). Then, 

and 

1 Xi) 

= 2E^|xdiv(a(X))"] + 2Ep^[div(a(X)) X"] - 2mf)E^^[div(a(X)) ]" 

- ^Ep^[div(a(X)) ] " + Ep^[a(X)] + 5^5^" + 

+ ¥.^[xcf'>\{X)] + Ep^[a(X)C'f X"] - ¥.^^[a{X)] (23) 

- Ep^[a(X)] Cf" + Ep^JxX^D^a(X)] + Ep^[a(X)Df XX"' 
-mf\\x^Df\{X)\ -Ep^[a(X)Dfx] 

The proof is provided in Appendix D. Note that given and the mean and variance 
of Xt can be expressed as mt = X)£=i and St = + Yl\=i — 

(ELi respectively. 

Remark 14 If the coefficients Vi in the convex combination of the marginal density 
p(x, ..., 0j^^) in Theorem 13 are fixed a priori, the 0DEs (22) and (23) are only 
sufficient for describing and s\^\ Oftentimes, however, one is interested in choosing 
those coefficients a posteriori, for example by solving an auxiliary optimization problem. In 
such a setting the ODEs given by Theorem 13 are necessary and sufficient. 

We have studied how to reformulate the constraints (i) and (ii) of Problem 3 by deriv¬ 
ing an expression for the drift term to the approximating SDE (15). In the case that the 
marginals in (ii) are restricted to a mixture of multivariate normal densities this reformu¬ 
lation reduces to the ODEs (21), (22) and (23). 

4.3 Example: Geometric Brownian Motion 

We continue the geometric Brownian motion example started in Section 2.1, The goal 
is to approximate the smoothing density by a normal density. Therefore, according to 
Proposition 11, the drift function for the approximating SDE (15) has to be chosen as 

u{x, t) = At + (A^ Bt)x + X^x‘^(Ct + Dtx), (24) 
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where the coefficients At, Dt are coupled via the two ODEs (21). This choice of drift 

function leads to ODEs for the mean and the variance of the posterior process, according 
to Theorem 13 

-\- At -\- Btirit + y?Ct{jn^ + St) + )?Dtijn^ + SnitSt) ( 25 ) 

1 Q 

= X^m^t + 35t) + 2BtSt + AX^CmtSt + QX^Dtim^St + Sf). (26) 

5. Optimal control problem formulation 

In this section, we show that the optimization problem 3, using the results derived from 
Theorem 9, can be reformulated as a standard optimal control problem (OCP), which 
conceptually is similar to Mitter and Newton (2003)^. Therefore, the presented variational 
approximation method to the path estimation problem for SDEs can be expressed as an 
OCP and as such leads to a standard formulation of necessary global optimality conditions 
in terms of Pontryagin’s maximum principle. Consider the vector spaces V := M” x 
Z := X Sym(n,M) x x Sym(n,M) and define the trajectories 

[0,r] 9 t ^ v^^\t) := {Af\Bf^) G V 

[0,r] 9 t ^ zW(t) := G i, 

for £ = 1,..., /c. We introduce the state variable z{t) := ( 2 ;^^^(t),..., z^^\t)) G 0^=1 Z =-. Z 
and the control variable v{t) := (t ),..., (t)) G Y^t=i V =: V for t G [0, T]. As a first 

step, in view of the cost functional (16) of Problem 3, the so-called Lagrangian 

Eg \h\u{Xt,t)-f{Xt)\\l^^^) 

f 1 \ 1 1 

+ yt UiXt,t)^VhiXt) + ^a{Xt)^X^h{Xt)a{Xt) \ + ^\\h{Xt)f 

is expressed as a function of only z{t),v{t) and t. This step, while being exact in some 
cases, may require an approximation. In the case that the marginals of Q are mixtures of 
normal densities, the expectation of any polynomial in Xt can be expressed as a function of 
its mean and variance. If the diffusion term ct is a polynomial, and no mixture is considered 
(A: = 1), the drift function u, according to (20), is a polynomial. We refer to Section 8 to 
see how the Lagrangian can be derived for two concrete examples. Consider a Lagrangian 

L : [0,T] X Z X V ^ M., L{t, z{t),v{t)) ^ (27), 

where ~ indicates that in order to express the term (27) by the state and control variables 
only, an approximation might be needed, as explained above. Similarly to the Lagrangian, 
in view of the cost functional (16), we introduce a terminal cost F : .Z —)■ M by 

FiziT)) PS -yTEQ[h{XT)] . 

4. Note that Mitter and Newton (2003) addresses a related problem, whose main difference, when compared 
to the presented method, is that the variational characterization considered there is exact. 
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Under the assumption that the drift term cj is a polynomial, the ODEs derived in the 
previous section can be expressed in standard form. We dehne the function H : Z xV ^ Z 
by 

where 


dru- 




dt 


d5, 




dt 


d 

-^(t) = {z{t),v{t )), 


da 


a 


dt 


dD 




dt 


d 


for i = 1,... ,k are given by (22), (23) and (21). Thus, we have shown so far in this article 
that Problem 3 can be reformulated as the following optimal control problem 


minimize 

J{v) 

II 

z{t),v{t))dt + F{z{T)) 

v&M([0,T],V) 

subject to 

z{t) 

II 

v{t)), t G [0,T] a.e. 


z{0) 

= Zo, 



(28) 


where M{[0,T],V) denotes the space of measurable functions from [0,T] to V. It remains 
to discuss how to find the initial condition zq in the OOP (28). A straightforward, however, 
clearly not efficient, method for that is solving the Pardoux equation (8), which according 
to (9) provides the smoothing density at initial time as 'Ps{x,0) = ^ from 

where zq can be derived. 

5.1 Maximum principle 

We derive necessary conditions for global optimality of the optimization problem (28) that 
are provided by the Pontryagin maximum principle (PMP). Since the control set V is un¬ 
bounded, we need an extended setting of the standard PMP, see (Clarke, 2013, Section 22.4) 
for a comprehensive survey. It requires some further assumptions. 

Assumption 15 Let the process (2:*(t), u*(t))^g[o,r] be a local minimizer for the OCP (28), 
that satishes 

(i) The function F is continuously differentiable; 

(ii) The functions H and L are continuous and admit derivatives relative to 2 : which are 
themselves continuous in all variables {t,z,v)] 

(iii) There exist e > 0, a constant c, and a summable function d such that for almost every 
t £ [0, T], we have 

\z - z*{t)\ <£ ^ \Vz{H,L){t,z,v*{t))\ < c\{H,L){t,z,v*{t))\ + d{t). 
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Note that Assumption 15(iii) is implied if 

\V:,H{t,z,v)\ + \ VzL{t,z,v)\ < c{\H{t,z,v)\ + \L{t,z,v)\) + d{t) 

holds for all u G V when 2: is restricted to a bounded set, which is satisfied by many systems. 
Moreover, the condition automatically holds if v* happens to be bounded. 

Lemma 16 (PMP (Clarke, 2013, Theorem 22.2)) Given Assumption 15, let the pro- 
eess (-2*(t),^^*(t))te[o,r] be a local minimizer for the problem (28). Then there exists an 
absolutely continuous function p : [0, T] Z satisfying 

1. the adjoint equation p{t) = —Vz {p{t), H{z*{t),v*{t)))—VzL{t, z*{t),v*{t)) for almost 
every t G [0, T]; 

2. the transversality condition p{T) = VzF{z{T)); 

3. the maximum condition 

{p{t),H{z*{t),v*{f))) + L{t,z*{t),v*{f)) = mi {p{t),H{z*{f),v)) + L{t,z*{f),v) for 

pG V 

almost every t G [0,r]. 

Remark 17 1. Given that an optimal process {z*,v*) exists^, the maximum condition 

3 can be used to derive a feedback law 

v*{t) G a.Tgmm{p{t),H{z*{t),v)) + L{t,z*{t),v). 
pG V 


2 . Lemma 16, basically leads to a boundary value problem with initial conditions for 
the states and terminal conditions for the adjoint states, that provides necessary 
conditions for global optimality of Problem 3. 

We summarize the described method to approximate the smoothing density introduced 
so far. It basically consists of the following three steps, that provide a solution to Problem 3: 

Step 1 Fix a mixture of exponential families of probability densities, e.g., the mixture of 
multivariate normal densities. Theorem 9, that simplifies to Proposition 11 for the 
multivariate normal densities, characterizes the approximate posterior SDE (15) whose 
solution admits marginal densities evolving in the chosen mixture of exponential fam¬ 
ilies. 

Step 2 Given the approximate posterior SDE (15), we derive an optimal control formulation 
of Problem 3. For the mixture of multivariate normal densities, this derivation is 
presented in Sections 4 and 5 and finally leads to the OGP (28). 

Step 3 Necessary conditions for optimality of the OGP (28), and hence for Problem 3, can 
be derived from Pontryagin’s maximum principle and result in a structured boundary 
value problem. 

5. Existence of an optimal process can be assured by standard existence results, see for example (Clarke 
2013, Theorem 23.11). 
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Remark 18 It is important to note that the presented method chooses the best approxi¬ 
mating SDE in a desired class using an objective distance measure between the correspond¬ 
ing probability distributions. One crucial advantage of this approach is that this distance 
could be quantified and numerically calculated (note that the first term in Lemma 2 can 
be directly computed and the remaining two terms form the objective function of the op¬ 
timal control problem considered), and hence the user gets an excellent estimate on the 
necessary approximating error. For instance, Figure Id and Figure 2d demonstrated the 
accuracy of corresponding approximating SDFs by plotting the relative entropies between 
the approximate models and the exact ones for the two examples considered in the paper. 

5.2 Computational complexity 

If the initial condition to the OCP (28) is known, the PMP, Lemma 16, reduces to a bound¬ 
ary value problem, that can usually be solved numerically more efficiently than (S)PDFs 
by using numerical methods specifically tailored to these problems, such as the shooting 
method, see Stoer and Bulirsch (2002). Therefore, the major computational difficulty of 
the presented variational approach lies in estimating the initial condition to the OCP (28), 
for example via estimating the smoothing density at initial time. A straightforward, how¬ 
ever clearly not efficient, method for that is solving the Pardoux equation (8), as explained 
in Section 2, which we used in the numerical examples in Section 8, As such, whereas 
the standard PDF approach for computing a smoothing density requires solving a Zakai 
equation and the Pardoux equation (8), the presented variational approach relies on only a 
Pardoux equation and the mentioned boundary value problem. This can usually be seen as 
a reduction in terms of computational effort required and is demonstrated by two numerical 
examples in Section 8, Table 1. Moreover, for future work, we aim to study the derivation of 
an estimator for the marginal smoothing density at terminal time without solving a Pardoux 
equation, that would then allow us to apply the proposed variational approximation method 
to high-dimensional problems, see Section 9 for more details. Another idea to circumvent 
the estimation of this mentioned terminal condition is to use an alternative approach to the 
PMP, for characterizing a solution to the OCP (28) that is briefly described in the following 
remark. 

Remark 19 (Semidefinite programming) Solutions to the OCP (28) can be charac¬ 
terized via the so-called weak formulation which consists of an infinite-dimensional linear 
program, see (Lasserre, 2010, Chapter 10) for details. Therefore, numerical approximation 
schemes to such infinite-dimensional linear programs, that have been studied in the liter¬ 
ature, can be employed to solve Problem 3, This approach seems particularly promising 
when the data of the OCP (dynamics and costs) are described by polynomials, as then 
the seminal Lasserre hierarchy based on solving a sequence of semidefinite programs, is 
applicable Lasserre (2001, 2010). 

5.3 Example: Geometric Brownian Motion 

We continue the geometric Brownian motion example started in Sections 2.1 and 4.3 and 
formulate the corresponding optimal control problem (28). Recall that the state variable is 
defined as z{t) := {mt, St, Ct, Dt) and the control variable as v{t) := {At, Bt). The ODEs for 
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the state variables are given by (21), (25) and (26). The objective function of the optimal 
control problem (28) can be expressed as F{x{T)) = —yTTriT and 


L{t,z{t),v{t)) = 






2X‘^{m‘f + St) ' X^mt ' 2X‘^ 

+ TTit {Cti}? -\- Bf — k) + AtDt + yti}? + Bt)) 

+ (mf + St) (-zX?C^ + Dt{X? + Bt — k) + — + X?ytCt 


+ {rrit + SmtSt) [X'^CtDt + X^ytDt) + {mj + Qrri^St + 3St)—D^, 


where, in order to derive the cost function above, the first two inverse moments of Xt with 
respect to Q have been approximated. Due to the non-negativity of the GBM, we use the 
approximation ps EQ[Xt] ^ and Eq[X^“^] ss EQ[Xt] ^ = {St + mt)~^, 

whose accuracy has been investigated in Garcia and Palacios (2001). 


6. Parameter inference 

The goal of this section is to outline the use of the techniques, developed so far for path 
estimation, for inference of parameters in a hidden Markov model. We consider a class of 
dynamical systems 

dX^ = f{X^,K)dt + a{X^,K)dWt, X^ = xo, 0<t<T, (29) 

parametrized by k. The observation process can be modeled by (2), but as discussed in the 
next section, the approach discussed below can also be used with necessary modifications 
for a discrete observation process. 

As a natural notation, for each parameter k, the probability distribution of X^ rp-^ on 
C will be denoted by npj,jQj,. Given a sample path {yt;0<t<T}of the observa¬ 
tion process T[o,t]) the objective is to select an optimal k* G such that the obser¬ 
vation process (Pj)te[o,r] ™ (2) has a high probability of reproducing the given data y. 
This is basically the inference scheme based on classical maximum likelihood estimation, 
and we propose an algorithm similar to the lines of expectation maximization (EM) algo¬ 
rithm (see Cappe et al. (2005) for a comprehensive survey), which aims to obtain the op¬ 
timal K* through multiple iterations. Recalling (14), for each k, we define M{HT{-,y)) ■= 
— log ^ f exp {—HT{-,y)) dnpj,;Qj,^ . As already noted in Section 3, for each parameter k, the 
term M{HT{-,y)) provides the total information available through the sample path y, and 
can be interpreted as the negative log-likelihood of y given the parameter n. However, min¬ 
imizing this negative log-likelihood function, even if numerical evaluation of it can be done, 
usually is a hard problem. But, as mentioned in Section 3, Lemma 2 and non-negativity 
of the relative entropy together imply that an upper bound to this negative log-likelihood 
term is given by the apparent information, X{Q, k) := |npj,jQj,^ +Kq[Ht{-, y)] ■ The ad¬ 
vantage of this observation is that this upper bound to the negative log-likelihood function 
is also the objective function in Problem 3, for which the program for finding the mini- 
mizer Q is by now well-established. Therefore instead of minimizing the actual negative 
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log-likelihood, we minimize an upper bound of it. The path to hnd the right parameter k 
corresponding to the sample path y is now quite standard in statistics. After initialization 
of the parameter k, we hnd the optimal Q by solving the Problem 3, and then in the sub¬ 
sequent step, for this Q we obtain the optimal parameter k by minimizing iF{Q,K). This 
yields an iterative EM-type algorithm whose details are given below. 


EM-type algorithm 


initialize 
while 
Step 1: 
Step 2: 

Step 3: 


i = 0, Ki := Ko 

i < M 

compute Qi by solving Problem 3 with parameter k* 
update parameter as Kj+i G argmin k) 

K, 

set i —)• i -|- 1 


Remark 20 Analyzing convergence of the above algorithm and consistency of the above 
corresponding estimator is the next important step and will be addressed in our future 
projects. 

We refer to Section 8 for a numerical visualization of this variational parameter inference 
method applied to two examples and to Section 9 for a discussion about convergence and 
consistency of the estimator as a topic of further research. 

7. Discrete time measurement model 

In most practical examples, the measurements of physical quantities are processed by com¬ 
puters, and as such the data available are obtained only at discrete times, potentially re¬ 
stricted to a low number. The goal of this section is to outline how the discussed variational 
approximation scheme adapts naturally to such cases with obvious modihcations. 

In this case the signal process (1) is observed through noisy measured data y := {yk}k=i 
at discrete times ti < t 2 < ■ ■ ■ < tjsf < T. The canonical model for the observation process 
is thus given by 

Yk = h{Xk,tk) + Pk, ior k = 1,... ,N, ( 30 ) 

where := Xt^, h : M” x M —)• MY is a measurable function, the pk are M”'-valued i.i.d. 
Gaussian random variables with zero mean and covariance Rk, and they are independent of 


xq and a(Ws ■ s < T). We consider 

m such that tm 

< t < tm+i and similarly to Section 2 

dehne the hlter density p and smoothing density Vs by 


E[<p{X{t))\Yi,. 

' • • ) Yjji, Xq] — J 

(j){x)p{x, t) dx 

(31) 

E[</>(A(t))|yi,. 

■ ■ ,Yn,xo] = 1 

(j){x)Vs{x,t) dx. 

(32) 
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where (p is any measurable function from to M. It is well known (see (Eyink, 2000, 
Appendix) for a derivation) that the smoothing can be expressed as 


Vs{x,t) 


p{x, t)w{x, t) 
J^nP{x,t)w{x,t)dx' 


(33) 


where p{x,t) and w{x,t) in between the observation times are the solutions to the 


\ dp{x,t) = A*p{x,t)dt 

Kolmogorov forward equation: ^ / n (34) 

P[x,0) = po{x), 


{ dw{x,t) =—Aw{x,t)dt 

(35) 

w{x,T) = 1, 

punctuated by jumps at the data points tfc for A: = 1,..., iV 

p{x,tt) oc p{x,tk) exp (^lR^^h{x,tk) - ^h{x,tkVRp^HxAk)^ (36) 

w{x,tk) oc w{x,tt)exp (^ylR~^h{x,tk) - ^h{x,tkVRp^HxRk)^ ■ (37) 

Similar to the continuous time measurement model, it can be shown that the smooth¬ 
ing density solves the Kolmogorov forward equation given by (10), with drift function 
g{x,t) '■= f{x) + a{x)'V log w{x,t), where w is the solution to (35). As before, we denote 
the prior probability measure by npi.ior(A) = P[A[o^r] ^ a-iid the posterior probabil¬ 
ity measure, induced by the solution to (10), by npost(^, E) = P[X[o^ 7 ’] G where 

= (t{xo, Ki, ..., Yn)- Let pk denote a realization of the observation process at the time 
tk- The variational approximation derived in Section 3, and, in particular, Problem 3 carries 
over to the discrete time observation setting considered here. As before, the path to the 
objective function starts from Lemma 2, which holds in this case with 

^ /I \ 

HT{X,y) -Y.UWRk^hiXuti)^ -yjRp^h{Xi,U)] . (38) 

i=l ^ ^ 

One way to see this is to recast the discrete model in the traditional setup of Section 2, 
and then use the Kallianpur-Striebel theorem. To do this, first assume that without loss of 
generality Rk = I- Define the function h : C x [0, T] —)■ M” by 

h{x, t) = ^(4+1 - tk) ^^^h{x o 7?(t), ?7(A))l{4<t<4+i}> 
k 

where y : [0, T] —)■ [0, T] is defined as 

y{t)=tk, Ytk<t<tk+i. 

Define the observation model Yt = /g h{Xs, s) ds + Bt. Notice that for each k, 

Yk+i -Yk = (4-1-1 - tk)^^‘^h{Xk,tk) + {B{tk+i) - B{tk)), 
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and hence 

(4+1 — 4) ^^‘^(Xk+i — Yk) = h{Xk, tk) + Pk, 

where pk Pk ~ AA(0, 1). In other words, (4+i — 4)~^^^(i^fc+i — Yk) Yk, and in this 
sense the discrete measurement model can be subsumed in the observation model given by 
Yt = /o hiXs, s) ds + Bt. 

Notice that by the definitions of Y and h, the exponent in Kallianpur-Striebel formula 
is given by 

£l\\HX,s)fds - l\x,s)dY{s)=Y^^^\\h{Xk,tk)f - 

^ /I \ 

'^=Yliy-^\\^iXk,tk)\f-Y^h{Xk,tk)j , 

which leads to (38). Therefore, in this case the objective function in Problem 3 can be 
expressed as 

D(Q||npHor) + EQ[FT(-,y)]=^\ ^||u(Xt,t)-/(Xt)||^(^,) + 4^ot) dt, (39) 
where 

^ / 1 \ 
iiXt,t) = Y, (yjRk^HXuti) --\\R^^h{Xi,ti)fU{t-u). (40) 

i=l ^ 

Section 4 is independent of the considered measurement model, and by following Section 5 
we arrive at an optimal control problem (28), where the cost functional is replaced by (39). 
The derivation of necessary conditions for global optimality of the optimization problem 
(28), compared to the continuous time measurement model, here is somewhat nonstandard, 
due to the Dirac delta terms (40) involved in the Lagrangian. However, the problem can 
be seen as an OOP with so-called intermediate constraints, for which an extension of the 
PMP is available Dmitruk and Kaganovich (2008). 

Assumption 21 Let the process ('2*(t), ^^*(t))te[o,r] be a local minimizer for the optimal 
control problem (28), that satisfies 

(i) Assumptions 15(i) and (ii); 

(ii) V* is measurable and essentially bounded. 

Lemma 22 (Extended PMP) Let the process (•2*(t), t’*(i))ie[o,T] be a local minimizer for 
the problem (28). Given Assumption 21, then there exists an absolutely continuous function 
p : [0,T] ^ Z satisfying 

1. the adjoint equation p{t) = —Vz {p{t), H{z*{t),v*{t))'j—YzL{t, z*{t),v*{t)) for almost 
all t G [0, T]; 

2. the transversality conditions p{ti) = p{t~) — Vz^Q[i{X,ti)~\ for i = and 

p{T) = 0; 
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3. the maximum condition 

{p{t),H{z*{t),v*{t))) + L{t, z*{t),v*{t)) 

= sup (^p{t), H{z*{t),v{t))') + L{t, z*{t),v{t)) for almost all t G [0,T]. 
veM{lO,T],V) 

Proof Follows directly from Dmitruk and Kaganovich (2008), when transforming problem 
(28) into an OOP with intermediate constraints. ■ 


Remark 23 1. Note that the data (measurements) enter the expression through the 

cost function, namely the term (40), which is nonzero only at measurement times 
{ti}^i and leads to jumps in the adjoint state. 

2. Lemma 22, basically leads to a boundary value problem, that provides necessary 
conditions for optimality of Problem 3. See Section 5.2 for a discussion about how 
to numerically solve it. We refer to the numerical examples in Section 8 for the 
performance of such a solution. 

8. Simulation results 

In this section, we present two examples to illustrate the performance of the variational 
approximation method introduced. Both examples have important applications in mathe¬ 
matical finance. As a first example, we consider the geometric Brownian motion that we 
introduced as a running example in Sections 2.1, 4.3 and 5.3, The second example is con¬ 
cerned with the Cox-Ingersoll-Ross process, that is often used for describing the evolution 
of interest rates Cox et ah (1985). 

8.1 Geometric Brownian motion 

As presented in Sections 2.1, 4.3 and 5.3 we consider a one-dimensional geometric Brownian 
motion (GBM) (11) and assume that the available data are noisy observations {yk}k=i 
time tk, modeled by the observation process 


hfc — Xtf, + Pk, 


where {pk}^^i are i.i.d. normal random variables with zero mean, standard deviation R 
and tN = T. 

PDE approach. As explained in Section 7, the smoothing density can be characterized by 
(33) that is the (normalized) product of two densities w and p. The first density satisfies 
equation (35) with jump conditions (37) at the measurement times and terminal condition 
w{x, T) = exp marginals are shown in Figure la, The second density, 

called the filter density, is given by equation (34) with jump conditions (36) and initial 
condition p{x,0) = exp ^ ^ that is given by (11). Its marginals are shown 

in Figure lb. The smoothing density is depicted in Figure Ic as the solid line. 
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Variational approximation. Following Section 4.3, the drift function for the approximating 
SDE (15) has to be chosen as (24). The optimal control problem can be formulated along the 
lines of Section 5.3, choosing the discrete-time measurement setting presented in Section 7, 
Note that Assumption 21 can be easily verified to hold, if we restrict the optimizers in (28) 
to bounded controls. We solve the boundary value problem obtained from Lemma 22 under 
the assumption that the smoothing density at initial time is available, see Section 5.2 for 
a discussion about this assumption. The solution is depicted in Figure Ic as the dashed 
line. Finally, Figure Id shows the relative entropy between the marginals of the smoothing 
density obtained by the PDF approach and the variational method, and hence reflects the 
accuracy of the variational approximation. 

Parameter inference. We consider the case where the drift parameter k in (11) is assumed to 
be unknown. Figure le shows the performance of the EM -Algorithm introduced in Section 6 
for an initial guess kq = 4 of the unknown parameter. It can be seen that the estimator k is 
close to the true value of k = 1 indicating the efficacy of our algorithm. Also, the algorithm 
converges quite rapidly. 





(a) Density w{x,t) (b) Filter density p(a:, t) (c) Smooting density P5(a;, t) 




(d) Relative entropy 


(e) Parameter inference, k = 1.1867 


Figure 1: Geometric Brownian motion: Comparison of the PDE solution (solid) versus 
the variational approach (dashed). The considered numerical values are: k = 1, A = 0.1, 
R = 0.15, T = 0.2s, ^ = 0, 0- = 0.25, N = 4, ti = T/4, ts = T/2, tg = 3r/4 and U = T. 


8.2 Cox-Ingersoll-Ross 

Consider as underlying system a Cox-Ingersoll-Ross (CIR) process 


dw = k( 6 - w)dt + A^/Wdm, Ao = a:o~AA(/i,a), (41) 
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for 0 < t < T and assume that the available data are noisy observations {yk}k=i time 
tk, modeled by an observation process 

+ Pk ) 

where pk are i.i.d. normal random variables with zero mean, standard deviation R and 
tN = T. 

PDE approach. As in the GBM example 8.1, we solve the underlying PDEs introduced 
in Section 7, to characterize the smoothing density as the (normalized) product of two 
densities v and p. Figure 2a shows the marginals of the first density w, the hlter density p 
is depicted in Figure 2b and the smoothing density in Figure 2c as the solid line. 
Variational approximation. The variational approximation is derived similarly to the GBM 
example 8.1, where we chose a drift function for the approximating SDE (15), according to 
Theorem 9, as 

u(x, t) = + A{t) + B{t)x + }?x{C{t) + D{t)x). (42) 

The variational approximation to the smoothing density is depicted in Figure 2c as the 
dashed line. Finally, Figure 2d shows the relative entropy between the marginals of the 
smoothing density obtained by the PDF solution and the variational approximation. 



(a) Density w{x,t) (b) Filter density p(a::, t) (c) Smooting density Vs{x,t) 




(d) Relative entropy (e) Parameter inference, k = 1.4469 

Figure 2: Gox-Ingersoll-Ross: Gomparison of the PDF solution (solid) versus the variational 
approach (dashed). The considered numerical values are: A = 0.2, k = 1, b = 0.3, /i = 1, 
a = 0.1, R = 0.1, T = 0.3s, N = 2, ti = T/2 and t 2 = T. 

Parameter inference. We consider the case where the parameter n in the drift term of (41) 
is unknown. Figure 2e shows the FM-Algorithm introduced in Section 6 for an initial guess 
kq = 4: of the unknown parameter. It can be seen that the estimator k is close to the true 
value of K = 1 indicating the efficacy of our algorithm. Also, the algorithm converges very 
fast. 
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Table 1: Runtime comparison. The presented variational approach for approximating 
the smoothing density is compared with the standard PDE approach for the two examples 
in Sections 8.1 and 8.2. All simulations are performed on a 2.3 GHz Intel Core i7 processor 
with 8 GB RAM using Matlab. 



Geometric Brownian motion 

Gox-Ingersoll-Ross 

Forward PDE (34) 

2.02 s 

1.33 s 

Backward PDE (35) 

2.87 s 

2.38 s 

Boundary value problem 

0.10 s 

0.23 s 

PDE approach® 

4.89 s 

3.70 s 

Variational approach^ 

2.97 s 

2.61 s 


Table 1 summarizes the runtimes of the two numerical examples above. It can be seen 
that the boundary value problems provided by the maximum principle can be solved by 
roughly one magnitude faster than the backward PDE (35), that is the reason for the 
speedup of the variational approach compared to the PDE approach. Moreover, it is high¬ 
lighted that the main computational effort in the variational approach is needed to estimate 
the marginal smoothing density at initial time, which is done by solving the backward PDE 
(35). If, however, the backward density at initial time could be estimated in a more effi¬ 
cient way, e.g., by using an MCMC method, the proposed variational approximation method 
could be applicable to high-dimensional problems. 

9. Conclusion 

The paper is devoted to a variational method for estimating paths of a signal process in 
a hidden Markov model. In particular, this leads to approximations of smoothing den¬ 
sity which can be used to reconstruct any past state of the signal process given a full set 
of observations. A crucial fact that plays an important role in our method is that the 
smoothing distribution is induced by a posterior SDE which itself is a modification of the 
original signal process. The presented variational approach proposes an approximate SDE 
which minimizes the relative entropy between the posterior SDE and a class of SDEs whose 
marginals belong to a chosen mixture of exponential families. In the simplest case of nor¬ 
mal marginals and a posterior SDE with constant diffusion term, the approximating SDE 
consists of a linear drift and constant diffusion term, which is well known. It is shown that 
the prescribed approximation scheme can be formulated as an optimal control problem, and 
necessary conditions for global optimality are obtained by the Pontryagin maximum prin¬ 
ciple. The resulting numerical methods have considerable computational advantages over 
numerically solving the underlying (S)PDEs, that is highlighted by two examples. The de¬ 
veloped approximation scheme is then used for designing an efficient method for parameter 
inference for SDEs. 

For future work, as mentioned in Section 5.2, we aim to study how to efficiently estimate 
the backward density at initial time. Then, the presented variational approximation method 

6. consists of solving the two PDEs (34) and (35) 

7. consists of the PDE (35) and the boundary value problem in order to solve Problem 3 
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reduces to solving a PMP-shooting-type boundary value problem that is tractable even in 
relatively large dimensions, compared to PDEs. Additionally, it would be interesting to 
study numerical methods specifically tailored to the boundary value problems resulting 
from the maximum principle, such as the shooting method, see Stoer and Bulirsch (2002) 
for a comprehensive summary, as well as the approach of solving the optimal control problem 
via its weak formulation as pointed out in Remark 19. 

Our future projects will also delve into analyzing the convergence of the EM-type algorithm 
used for parameter inference as well as the properties of the obtained estimators. We will 
also focus on rehning the basic inference algorithm to get better efficiency and speed. One 
promising path to take in this direction would be designing of suitable adaptive EM-type 
algorithms. It is also conceivable that the ideas mentioned in the paper can be combined 
with suitable MCMC schemes or techniques known as Assumed Density Eiltering, see Harel 
et ah (2015), to get better accuracy and efficiency in high-dimensional models. 
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Appendix A. Derivation of Equation (10) 


We consider the one-dimensional case; an extension to the multi-dimensional case is straight¬ 
forward. According to (9) the smoothing density is given by Vs{x,t) = K{t)p{x,t)w{x,t), 
where Ar(t) ;= p{x,t)w{x,t)dx) The main idea is to recall that the process (A(t))jg[o^ 7 ’] 
is known to be almost surely constant (Pardoux, 1981/82, Theorem 3.2). Therefore 

^Vs{x,t) = K{t)p{x,t)^^w{x,t) + K{t)w{x,t)^^p{x,t) 


^ 'Ps{x,t) 
w{x, t) 

+ w{x,t) f- 


—f(x)w'(x,t) — ^a(x)w"(x,t) — w(x,t)h(x)^dYt 
f(x)'Ps(x,t)\' 1 fa{x)Vs{x)\" , Vs{x,t) 


w{x, t) 


+ 2 


w{x, t) 


+ 


w{x, t) 


-h{xydYt 


The proof follows by a straightforward computation. We compute in a preliminary step 
' f{x)'Ps{x,t)\' 1 


w{x, t) 


w‘^{x, t) 


and 


a{x)Vs{x,t) 
w(x, t) 

1 


w{x, t) 


{{f{x)rs{x,t)+f{x)rs{x,t)) w{x,t) -f{x)rs{x,t)v'{x,t)), 


(a"{x)Vs{x, t) -h 2a{x)V's{x, t) + a{x)Vs{x, t)) 


(2w'{x, t)a'{x)Vs{x, t) + 2w'{x, t)a{x)V'g{x, t) +a{x)Vs{x, t)w"{x, t)) 


w‘^{x, t) 

+ 3 / {2a{x)Vs{x,t)w {x,tf) . 

Using this two preliminaries, we get 

= -f'{x)Vs{x, t) - f{x)V's{x, t) + a'{x)V's{x, t) + {x)Vs{x, t) + a{x)Vs{x, t)) 


1 


(a'{x)w'{x, t)Vs{x, t) + a{x)w"{x, t)'Ps{x, t) + a{x)w'{x, t)V's{x, t)) 


w{x, t) 

+ 2 / .M x)w'{x,tfVs{x,t) 

W^[x, tj 


= -(f'ix)'Psix,t) + f{x)V's{x,t) + / .A a'{x)w'{x,t)Vs{x,t) 

\ W{X, t) ^ 


w{x, t) 

+ a{x)w''{x, t)Vs{x, t) + a{x)w'{x, t)Vs{x, t)) — 
+ ^ {a{x)rsix, t) + a{x)V's{x, t))' 


w^{x, t) 


{a{x)w'{x,tfVs{x,t)) 


= - fix) a(x) 


w'{x, t) 
w{x, t) 


Vsix.t) ) -{a{x)Vsix,t)) 


(^f (x) + a{x) {log w{x,t)y^ Vsix^t)^ + -{a{x)'Psix,t))”, 
and as such ( 10 ) holds. 
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Appendix B. Proof of Theorem 9 

Consider an arbitrary curve 1 1 —)■ p{-, • • •, evolving in ..., Define a 

diffusion 

dZt = u{Zt, t)dt + a{Zt)dBt, Zq = xq , 

with the given diffusion coefficient a(-) = cj(-)cj(-)^. Clearly the density of Zt coincides 
with p(-, , ©1^^) if and only if p{-, ©|^\ ..., ©|^^) satisfies the Kolmogorov forward 

equation for Zt, i.e., 


dp{x,G[^\...,e'i''’) 
dt 




d 


2=1 




( 1 ) 




n n „2 

+ ^EE 


i=i j=i 


dxidxj 


aij(x)p(x,©f\...,©f^)^ . 


(43) 


We will show this in two steps that (43) holds for the proposed drift term. Consider the 
decomposition Ui{x, t) = gi{x, t) + 7 i(x, t) for all i = 1 ,..., n, where 


1 ” d 1 

9i{x,t) ■= 2 YI 


—n(x 


i=i 


and 


7 i(x,t) := 


-1 


i=i 

k 


p(x,©s'\...,©r) 




p(x,©J^\...,©f^) t=l 


'^uePe{x,0f^) (Qf\xf\x)^. 


(44) 

(45) 


We use the shorthand notation p{x, ©J ' ) ;= p{x, Q\ ,..., ©J ). 
Claim 24 The functions gt defined in (44) for all i = 1,... ,n satisfy 


d 

[ 9 i{x,t)p{x,Q\ 

2=1 


^ n iL 

2 i=i j=i 


''aij{x)p{x,e[^'^^ 


dxidxj 


Claim 24 follows from a straightforward computation, see Appendix B in the extended 
version Sutter et al. (2015) for a detailed derivation. 

Claim 25 The functions 7 * defined in (45) for all i = 1,... ,n satisfy 

^p(x, ©f ^^) = - ^ ^ (liix, t)p{x, . 

i=l ^ '' 

Proof. 

^-p(x,©f^^^) = ^i/f^p^(x,©f^) = ^ne(^Q[^\c^^'>ix)-V0fie{ei^'^)\pe{x,eP). 


dP 


Moreover, 

d 


e=i 


1=1 




d 


p(x, ©j^'^^)2 V<9x,: 


p(x,©f^^M '^vtpi{x,Qf^) (Gf\xf\x) 


1=1 
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——^ ©f^) exp[/0f\ c(^)(x)\ - V;£(0f^)]\ , 

P(a;,©r )£=i 


where we used 

pe{x,eP)(ef\il^'> 


= {^^t\ j - V’f(0f^)]dCi 

Therefore, 

= -^^e ©f^)) P£(x, ©f ^), 


e=i 


and 


E i: (7.(^. tUx, 0["‘')) = E ef’) (E (ef*. a”(-. ef’) 


9x,' 


j=l ^ ^ £=1 \j=l 

k 


£=l 

where we used (18). 


(£).\ d 




= g^pU.e, „ 


The two claims imply (43) and hence complete the proof. 


Appendix C. Proof of Proposition 11 


The proof basically requires Theorem 9 and two additional lemmas. We first propose in 
Lemma 26 a choice of functions ipl ^ that satisfy (18) in Theorem 9, Then we show in a 
second step, in Lemma 27, that for this choice the integral terms (17) admit a closed form 
expression. We start with a few preparatory results that are needed to prove Proposition 11, 
Note that Ve'0(©) = (V^'0(©), V6u/'(©)) G V. and recall that according to (Bernstein, 2009, 
p.631) for A G B G and A G GL(n,M) g^tr {AX-^B) = -X-^BAX-^ and 

g^logdet (^AX~^B^ = —X~^B{AX~^B)~^AX~^ and therefore 


v.V’C©) = 


Ve^l^iO) = e 


-1 


-m 


- ^\r 


(46) 


Lemma 26 Consider the functions Lpf^ : x % ^ where iff"’ = 'xith 

iff} :R^xn^R^ and ■.R'^x'H^ R^^^ for i = 1,..., k. Let pf] and (/jg he defined 
as 








-I 


E,9, 




(4 


(x_i,ei)-v^V’K0f)) 
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1 


-1 


Then, (18) holds for all i = 1,... ,k. 

Proof of Lemma 26. According to (19) we have 


n n 

Y: (ef>.d'>U.eS'')) = E ((>i!'’.d1P.ef’)) + («!'>. A.’(-.ef'))) • 


2=1 2 = 1 

consisting of the two components 


n ^ / 1 \ 

E ef)) = E (vP.oP'e4’\c[‘\.) - v,*(e|''))) 

i=l i=l ' ' 


= ( E 0^''^~'E^9i^\c?{x) - )) ) = {4'\cf\x) - 


2=1 


and 


E {0i'\Ti%, ©f^)) = {0 ?,E ( E^0f\ci^\x) - v.V'Kef^)) 

i=l \ i=l ^ 

= {0l^\c2\x) - VeM^P)) - \ (0f\xrif^^ 9f'’ ^ (0^,0? 

= (0 f,cf(x)-V,V’r(0f^)), 


where we have used in the last step that (^f\xrif^ 9^^ ) = \ 9f\9f^ rlpx^), since 

for A G Sym(n,M) and B G tr(Ai?^) = tr(Ai3). 


Lemma 27 For i = 1,..., n, j = 1, 2 and £ = 1,... ,k eonsider 


TfJ{si,x):= I ip^ll{{x-i,Ci),e 


W/ 

j,i 


where the functions (pfj are chosen according to Lemma 26. Then, 




Qf\c^‘^\x-i,ii) - 


J:* 


Tf}{si,x) = ^6lf^ Ciexp {(Qf\c^^\x-i,Si) - c^^\x)y^ 
{xi,x) = j9f^ ^Ci (20f^X - r]f^)^^. 
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Note that zf\x) = {zf\{xi,x),Z^l{xi,x)), where zf\x) is the function defined in (17) 
and Cj denote the canonical basis vectors of EZ. 

Proof of Lemma 21. 

= j Ei20^/\cf\x-i,^i) - V^V’<?(0f^))exp (^(^e[^\c^^\x-i,fi) - dfi 

= J exp d^i 

= j Ei (2el^\x-i,^i) + exp d^i, 

where (46) was used. Consider the substitution z := (^&[^\ c^^\x-i,fi) — c^^\x)^ that leads 


z[^J{si,x) = Cjexp (/0f\c(^)(x_i,Si) - c 


(£) (t) 1 

where we used that 6l P 0, since 6l = —^Sl and the invese of a negative dehnite 
matrix is negative dehnite. For the second integral term 

2^2?(a^) = j V?^]((a;-*,C*)>0f^)exp d^i 

= r EjAe?{c^\x.,,^,) - VeMQV))e? - " 


+ ‘^'nf\x-hiiy exp (^(|0f\c(^)(x_i,^i) - dCi 

- 2ef\x-i,ii)'qf'^ + 2rif\x-i,fiyepef^ exp [(Qf\c^^\x-i,ii) - c^^\x)'yj d^i 


where we have used (46). By expanding terms and using integration by parts together with 
the hrst assertion of this lemma 

= \J pP Ei{2df^{x-i,if) + r/f^)(2(9f^(x_i,Ci) - 

exp - c(^)(x)^) d^i 

e\J Fiiexp (^<^0f\c(^)(x_i,^*) - d^i 

— .y,iZ\TnW ^ ^ f -riZu. .r.\foa(Z„.\TaiZ 


= ;^ zi : ux ., x ) i 2 eyx-yyyer 


z\‘^>y„x)i2eye,yey de. 


+ ^/ ^Eiexp(^(^el^\c^^\x-i,^i) - c^^\x 
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-Ut ei[2(J^ x-TJi- ) 


-1 


^ e^exp - c^ 


' —OO 

1 n 


ej2ef^ef^ 'de. 


+ X 


gW 


-1 


^^jexp ilQf\c^‘^\x-i,ii) - d 


7 ^t x-r]^ ) 


Proof of Proposition 11 We decompose the function Ui{x,t), given by Theorem 9 into 
Ui{x, t) = gi{x, t) + 7 i(x, t) for alH = 1 ,..., n, where 


1 A d 




i=i 


7i(x,t) : = 


-1 


i=i 

k 


gg-p(x, 0 r\..., 0 f^) 

p{x,el^\...,eP) 


^ (x, 0f ^) (©f \ xP ^ 


p(x, 0 f\..., 0 f^) 

As a preliminary step by invoking Lemma 27 

+ (e'Pplix)) 
+ 'ciY - iej'* 


= (ef j - jtr (^«!79!-' "e.7' »!" 9’ 

= + luo7-(i,'7 - 


1 


Therefore, 


= ~ , n(i) -^ ^ 

p(x, 0 ^ \ ^=1 




Lr'-'fU-- , 


Furthermore, 


9x 


-p(x,0j^\...,0f^) = ^ \ exp[(0f\ c(^)(x)') - V>£(0f^)] 


f=i 


= ^U£(^0f\ (ej,ejx^ + xej)^exp 
£=i 

= + 2ej0f^x) exp (^^0f\ c(^)(x)^ - , 


£=i 
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and therefore 


g{x,t) = ^div(a(x)) + ^ 


Note that our choice of satisfy (18) as shown in Lemma 26, which then completes the 
proof. ■ 


ELi Qf^)a{x) + 2(9fE) 


p{x, 0 


( 1 ) 
t ’ ' 


,0 


(fc)^ 


Appendix D. Proof of Theorem 13 

Lemma 28 For an SDE of the form (15) the mean mt and covarianee matrix St of Xt 
satisfy 

dmt = E[R(Ai,t)] dt, 

dSt = {E[Xtu{Xt,ty] +E[u{Xt,t)Xf] +E[a{Xt)a{Xty] ( 47 ) 

-mtE [u{Xt, t)] ^ - E [u{Xt, t)] mj ^ dt. 

Proof. The equation for the mean is trivial. For the variance let Yt := XtXj. Accord¬ 
ing to Ito’s Lemma 0ksendal (2003) dY) = Xt{u{Xt,t)dt + (j{Xt)dBty + (u(At,t)dt -|- 

a{Xt)dBt)XJ+a{Xt)a{Xty dt,aiids\mi\ai\Y dmf = (^mtE[u{Xt,t)]^ +E[R(Ai,t)] mj^ dt. 
Hence, 

d5(t) =E[dy(t)] - dm? = {E[Xtu{Xt,ty] +E[u{Xt,t)Xy +E[aiXt)a{Xty] 

—mtE [u{Xt, t)] ^ — E [u{Xt, t)] mJ ^ dt. ■ 


Lemma 29 Mean mt and varianee St satisfy 
k k k 


mt = ^ Vf^ml 
t=i 


(^) 


St = Y.n,s[^^ + Y.-^m?-^f 

t=l £=1 


\e=i 




\e=i 


(P 


Proof. The statement for the mean is straightforward. For the variance, 

St = Yl n/Ep^XXy - fX] 

£=i \e=i J V^=i / 

e=i ^ ^ £=i V£=i / Vt=i / 


Proof of Theorem 13 Consider a drift function u{x,t) given by (20). In view of Lemma 


28 

dmt 

dt 


YM^Pt Y^iv(a(A)) + Af) mf) +Ep^[a(A)] cf) + EpJa(A)Zlf a' 


£=1 
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which can be simplified according to Lemma 29, such that 


fc , (£) k 

E arn] 


r=i 


dt 




e=i 


+ A 




+ + EpJa(X)] cf^ + ¥.Aa{X)Df^X 


(48) 


Note that (48) has to hold for all > 0 such that = 1- Therefore 


dm 


(^) 


-B— = E 
dt 


-div(a(X)) 


+ Af + B^mf + EpJa(X)] a{X)Df^X . (49) 


(«) 




For the variance, we have according to Lemma 29 


dSt f dS^^^ dmj^^ (gt (£) 


dm 


(£)' 


dm 




dm) 


(£)' 


dt 

e=i 

This implies 


dt 


dt 


dt 


-m^ — mt 


dt 


^ dS* ^ f dm^p ^ mr {£)\ f dm|^^ 


r=i t=i 

where ^ is given according to Lemma 28, by 


dt 


^ =E[Xtu{Xt,ty] +E[u{Xt,t)X;] +E[a(At)c7(X4)^] - mt 


dmt 

dt 




Therefore, 


k ^ (r) 


=E[Xtu{Xt,ty] +E[u{Xt,t)Xp +E[a{Xt)a{Xty 


£=l 


£=1 


dm 




dt 


-m; + m- 


(B 


dm 


(i)' 


dt 


Recall that 


dm 


(^) 


dt 


is given by (49). Next, we compute 


(50) 


E[Xu{X,ty]=Y^e(^Pt 


1=1 
+ E, 


^Adiv(a(X)) 


Ve 


xcp\ix) 


+ E, 


Pe 


+ mPAPY{mPmPYsP)BP^ 


xx^oPyx) 


E[u{X,t)Xy=Y^eKt ^div(a(X))X" + A^ mf E i?f^(mf mf^E 5^) 

r=i ^ L J 


+ E, 


Pe 


a{X)cPx 


+ E, 


Pi 


a{X)DPxX 
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k 

Ej^a{X)a{Xy] = Ep[a(X)] = u,Ep^[a(X)] , 

e=i 

such that by evaluating (50) and by recalling that it has to hold for all convex combinations, 
we get the assertion (23). ■ 
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